
44 research outputs found

Enhanced Storytimes: Effects on Parent/Caregiver Knowledge, Motivation, and Behaviors

Author: Bailey-White Stephanie
Compton Erica
Ghoting Saroj
Shaw Staci
Stewart Roger A.
Publication venue: 'IUScholarWorks'
Publication date: 01/07/2014

The article offers information regarding the release of the second edition of the Every Child Ready @ your library initiative of the Public Library Association and the Association for Library Service to Children in 2011. It states that the initiative features five practices based in high-quality oral language development in children, such as reading, writing, and talking. It mentions that the initiative will help children in early literacy development and will educate caregivers and parents.

Boise State University - ScholarWorks

Tree model guided candidate generation for mining frequent subtrees from XML

Author: Abe K.
Agrawal R.
Chi Y.
Elizabeth Chang
Fedja Hadzic
Feng L.
Ghoting A.
Henry Tan
Ling Feng
Nijssen S.
Sidhu A. S.
Suciu D.
Tan H.
Termier A.
Tharam S. Dillon
Wang C.
Xiao Y.
Yan X.
Yang L. H.
Zhang J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

Due to the inherent flexibilities in both structure and semantics, XML association rules mining faces a few challenges, such as a more complicated hierarchical data structure and ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, non-redundant enumeration strategy which enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach. In this paper, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using the maximum level of embedding constraint. Our experiments with both synthetic and real datasets against two well-known algorithms for mining induced and embedded subtrees demonstrate the effectiveness and the efficiency of the proposed techniques.

Crossref

espace@Curtin

Towards NIC-based intrusion detection

Author: A. Ghoting
D. Panda
G. Li
M. Otey
S. Narravula
S. Parthasarathy
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004

Crossref

Introduction to Topic 5: Parallel and Distributed Data Management

Author: A. Ghoting
G. Antoniu
M. S. Pérez-Hernández
S. Orlando
Publication venue: Springer

Archivio Ricerca Ca'Foscari

A Distributed Approach to Detect Outliers in Very Large Data Sets

Author: A. Ghoting
E. Hung
F. Angiulli
J. Han
M.E. Otey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Crossref

A Distributed Approach to Detect Outliers in Very Large Data Sets

Author: A. Ghoting
E. Hung
F. Angiulli
J. Han
M.E. Otey
Publication venue: place:BERLIN HEIDELBERG
Publication date: 01/01/2010

We propose a distributed approach addressing the problem of distance-based outlier detection in very large data sets. The presented algorithm is based on the concept of outlier detection solving set ([1]), which is a small subset of the data set that can be provably used for predicting novel outliers. The algorithm exploits parallel computation in order to meet two basic needs: (i) the reduction of the run time with respect to the centralized version and (ii) the ability to deal with distributed data sets. The former goal is achieved by decomposing the overall computation into cooperating parallel tasks. Besides preserving the correctness of the result, the proposed schema exhibited excellent performance. As a matter of fact, experimental results showed that the run time scales up with respect to the number of nodes. The latter goal is accomplished by executing each of these parallel tasks only on a portion of the entire data set, so that the proposed algorithm is suitable for use over distributed data sets. Importantly, while solving the distance-based outlier detection task in the distributed scenario, our method computes an outlier detection solving set of the overall data set of the same quality as that computed by the corresponding centralized method.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
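
The distance-based outlier detection task named in the abstract above can be illustrated with a minimal, centralized sketch. This is not the solving-set algorithm of the paper, nor its distributed version. It assumes one common distance-based score (the sum of the distances from a point to its k nearest neighbours) and a brute-force search; the names knn_weight and top_outliers are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_weight(points: List[Point], i: int, k: int) -> float:
    """Assumed score: sum of distances from points[i] to its k nearest neighbours."""
    dists = sorted(
        math.dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    return sum(dists[:k])


def top_outliers(points: List[Point], k: int, n: int) -> List[int]:
    """Return the indices of the n points with the largest score (brute force)."""
    scores = [(knn_weight(points, i, k), i) for i in range(len(points))]
    scores.sort(reverse=True)
    return [i for _, i in scores[:n]]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_outliers(data, k=2, n=1))
```

The brute-force search compares every pair of points, which is the kind of cost that motivates splitting the work across nodes for very large data sets.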

Design of a next generation sampling service for large scale data analysis applications

Author: A. Ghoting
G. Buehrer
H. Wang
J. Saltz
S. Parthasarathy
S. Tatikonda
T. Kurc
Publication venue: ACM Press
Publication date: 01/01/2005

Advances in data collection and storage technologies have resulted in large and dynamically growing data sets at many organizations. Database and data mining researchers often use sampling with great effect to scale up performance on these data sets with small cost to accuracy. However, existing techniques often ignore the cost of computing a sample. This cost is often linear in the size of the data set, not the sample, which is expensive. Furthermore, for data mining applications that leverage progressive sampling or bootstrapping-based techniques, this cost can be prohibitive, since they require the generation of multiple samples. To address this problem, we present a solution in the context of a state-of-the-art data analysis center. Specifically, we propose a scalable service that supports sample generatio

CiteSeerX

Crossref
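
The point made in the sampling abstract above, that computing a sample often costs time linear in the size of the data set rather than the sample, can be seen in a minimal sketch of classic reservoir sampling. This is a standard technique swapped in here for illustration and is not the service the paper proposes; the function name reservoir_sample is hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], m: int, rng: random.Random) -> List[T]:
    """Draw a uniform sample of up to m items in one pass over the stream."""
    reservoir: List[T] = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < m:
            # Fill the reservoir with the first m items.
            reservoir.append(item)
        else:
            # Keep the new item with probability m / seen.
            j = rng.randrange(seen)
            if j < m:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 10, random.Random(0)))
```

Every record is read once even though only m are kept, so drawing several samples, as progressive sampling or bootstrapping would, repeats the full pass each time.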